RANDOM MATRICES: 
SHARP CONCENTRATION OF EIGENVALUES 



TERENCE TAO AND VAN VU 



Abstract. Let $W_n = \frac{1}{\sqrt{n}} M_n$ be a Wigner matrix whose entries have
vanishing third moment, normalized so that the spectrum is concentrated in the
interval $[-2, 2]$. We prove a concentration bound for $N_I = N_I(W_n)$, the
number of eigenvalues of $W_n$ in an interval $I$.

Our result shows that $N_I$ decays exponentially with standard deviation
at most $O(\log^{O(1)} n)$. This is best possible up to the constant exponent in
the logarithmic term. As a corollary, the bulk eigenvalues are localized to an
interval of width $O(\log^{O(1)} n / n)$; again, this is optimal up to the exponent.
These results strengthen recent results of Erdős, Yau and Yin (under the extra
assumption of vanishing third moment).

Our proof is relatively simple and relies on the Lindeberg replacement
argument.



1. Introduction 



The purpose of this paper is to sharpen the existing bounds on the eigenvalue
counting function $N_I = N_I(W_n)$ of a (normalized) Wigner matrix $W_n = \frac{1}{\sqrt{n}} M_n$,
and related quantities such as the Stieltjes transform $s_{W_n}(z)$ and individual
eigenvalues $\lambda_i(W_n)$. Let us first state the Wigner random matrix model which we will
use.

Definition 1 (Wigner matrices). Let $n \ge 1$ be an integer (which we view as a
parameter going off to infinity; in particular, $n$ is understood to be large enough
that quantities such as $\log\log n$ are well-defined and positive). An $n \times n$ Wigner
matrix $M_n$ is defined to be a random Hermitian $n \times n$ matrix $M_n = (\xi_{ij})_{1 \le i,j \le n}$,
in which the $\xi_{ij}$ for $1 \le i \le j \le n$ are jointly independent with $\xi_{ji} = \overline{\xi_{ij}}$ (in
particular, the $\xi_{ii}$ are real-valued). For $1 \le i < j \le n$, we require that the $\xi_{ij}$ have
mean zero and variance one, while for $1 \le i = j \le n$ we require that the $\xi_{ii}$ (which
are necessarily real) have mean zero and variance $\sigma^2$ for some $\sigma^2 > 0$ independent
of $i, j, n$. For simplicity, we will also assume that for each $1 \le i < j \le n$, the real
and imaginary parts $\mathrm{Re}\,\xi_{ij}$, $\mathrm{Im}\,\xi_{ij}$ are independent. We refer to the distributions
of $\mathrm{Re}\,\xi_{ij}$, $\mathrm{Im}\,\xi_{ij}$ for $1 \le i < j \le n$ and $\xi_{ii}$ for $1 \le i \le n$ as the atom distributions of
$M_n$, and view them as fixed while $n$ goes off to infinity.

We say that the Wigner matrix ensemble obeys Condition C0 if we have the
exponential decay condition

(1) $P(|\xi_{ij}| \ge t^C) \le e^{-t}$

for all $1 \le i, j \le n$ and $t \ge C'$, and some constants $C, C'$ (independent of $i, j, n$).

Two Wigner matrices $M_n = (\xi_{ij})_{1 \le i,j \le n}$ and $M'_n = (\xi'_{ij})_{1 \le i,j \le n}$ are said to have
matching moments to order $m$ for some $m \ge 0$ if one has

(2) $E\, \mathrm{Re}(\xi_{ij})^k\, \mathrm{Im}(\xi_{ij})^l = E\, \mathrm{Re}(\xi'_{ij})^k\, \mathrm{Im}(\xi'_{ij})^l$

for all $1 \le i, j \le n$ and all natural numbers $k, l \ge 0$ with $k + l \le m$. As we are
assuming the real and imaginary parts to be independent, this condition simplifies
to the conditions

(3) $E\, \mathrm{Re}(\xi_{ij})^k = E\, \mathrm{Re}(\xi'_{ij})^k$; $E\, \mathrm{Im}(\xi_{ij})^k = E\, \mathrm{Im}(\xi'_{ij})^k$

for all $1 \le i, j \le n$ and all $0 \le k \le m$. If we only require (2) or (3) to hold in the
off-diagonal case $i \ne j$ (resp. in the diagonal case $i = j$), we say that $M_n$ and $M'_n$
match moments to order $m$ off the diagonal (resp. on the diagonal).



We observe four basic examples of Wigner matrices:

• In the Gaussian Unitary Ensemble (GUE), $\xi_{ij} = N(0,1)_{\mathbb{C}}$ is the standard
complex gaussian random variable for $1 \le i < j \le n$, $\xi_{ii} = N(0,1)_{\mathbb{R}}$ is the
standard real gaussian random variable for $1 \le i \le n$, and $\sigma^2 = 1$.

• In the Gaussian Orthogonal Ensemble (GOE), $\xi_{ij} = N(0,1)_{\mathbb{R}}$ is the standard
real gaussian random variable for $1 \le i < j \le n$, $\xi_{ii} = N(0,2)_{\mathbb{R}}$ is a slightly
rescaled real gaussian random variable for $1 \le i \le n$, and $\sigma^2 = 2$.

• In the symmetric Bernoulli ensemble, $\xi_{ij}$ equals $+1$ with probability $1/2$
and $-1$ with probability $1/2$ for all $1 \le i, j \le n$, and $\sigma^2 = 1$.

• In the complex Hermitian Bernoulli ensemble, $\mathrm{Re}\,\xi_{ij}$, $\mathrm{Im}\,\xi_{ij}$ for $1 \le i < j \le n$
and $\xi_{ii}$ for $1 \le i \le n$ all equal $+1$ with probability $1/2$ and $-1$ with
probability $1/2$, and $\sigma^2 = 1$.
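
To illustrate the moment matching conditions (3), here is a minimal Python sketch (not part of the original argument; all function names are our own) that compares the exact moments of a symmetric Bernoulli variable with those of the standard real gaussian $N(0,1)_{\mathbb{R}}$. Under these assumptions the two agree to order three but not to order four.

```python
from math import factorial

def bernoulli_moment(k):
    """k-th moment of a variable equal to +1 or -1 with probability 1/2 each."""
    return 1 if k % 2 == 0 else 0

def gaussian_moment(k):
    """k-th moment of a standard real gaussian: 0 for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0
    half = k // 2
    return factorial(k) // (2 ** half * factorial(half))

def matching_order(moment_a, moment_b, max_order=10):
    """Largest m <= max_order such that the moments agree for all 0 <= k <= m."""
    m = -1
    for k in range(max_order + 1):
        if moment_a(k) != moment_b(k):
            break
        m = k
    return m

order = matching_order(bernoulli_moment, gaussian_moment)
print(order)
```

The first disagreement occurs at the fourth moment, which is 1 for the Bernoulli variable and 3 for the gaussian.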

Remark 2. Note that we do not require the off-diagonal entries $\xi_{ij}$, $1 \le i < j \le n$
(or the diagonal entries $\xi_{ii}$, $1 \le i \le n$) to be identically distributed. This lack of an
identical distribution hypothesis will be convenient when we apply the Lindeberg
exchange strategy [27], in which one Wigner matrix is compared to another one
by exchanging the entries of the former matrix with the latter one¹ at a time. As
such, the intermediate stages of this exchange process need not have identically
distributed entries, even if the initial and final matrices do.

The hypothesis of independence of real and imaginary parts is imposed purely
to simplify the exposition, and can easily be removed at the cost of some more
complicated notation; in particular, the simpler moment matching condition (3)
must be replaced by the more complicated condition (2). See Remark 23.

¹More precisely, we exchange the diagonal entries one at a time, and the off-diagonal entries
two at a time, in order to preserve the Hermitian property throughout.

In this paper, we will mostly deal with the (coarse-scale) normalization $W_n := \frac{1}{\sqrt{n}} M_n$
of the Wigner matrix $M_n$, and more specifically with the eigenvalue
counting function

$N_I = N_I(W_n) := |\{1 \le i \le n : \lambda_i(W_n) \in I\}|$

of this matrix for various intervals $I \subset \mathbb{R}$, where $\lambda_1(W_n) \le \ldots \le \lambda_n(W_n)$ denote
the (necessarily) real eigenvalues of the (Hermitian) matrix $W_n$.

The well-known Wigner semicircle law describes the bulk behavior of the counting
function $N_I$ of a Wigner matrix in terms of the semicircular distribution $\rho_{sc}(x)\,dx$,
where $\rho_{sc} : \mathbb{R} \to \mathbb{R}$ is the function

$\rho_{sc}(x) := \frac{1}{2\pi} (4 - x^2)_+^{1/2}$.

Theorem 3 (Semicircular law). Let $M_n$ be a Wigner Hermitian matrix obeying
Condition C0. Then for any fixed interval $I$ (independent of $n$), one has

$\lim_{n \to \infty} \frac{1}{n} N_I(W_n) = \int_I \rho_{sc}(y)\,dy$

in the sense of probability.



See for instance [4] for a proof of this theorem and for historical background.
Condition C0 can be omitted from this law, but we retain the hypothesis as it will
be needed for the subsequent results discussed below.

If we use $o(x)$ to denote a quantity that goes to zero as $n \to \infty$ after dividing by
$x$, we can reformulate Theorem 3 as the assertion that the asymptotic

(4) $N_I(W_n) = n \int_I \rho_{sc}(y)\,dy + o(n)$

holds with probability $1 - o(1)$ for each fixed $I$.
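
The main term $n \int_I \rho_{sc}(y)\,dy$ in (4) is easy to evaluate numerically. The following Python sketch (an illustration only; the function names and the closed-form antiderivative are our own additions, obtained by elementary calculus) computes it and cross-checks the antiderivative against a plain midpoint rule.

```python
import math

def rho_sc(x):
    """Semicircular density (1/(2*pi)) * sqrt(max(4 - x^2, 0))."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)

def semicircle_cdf(x):
    """Integral of rho_sc over (-infinity, x], via an explicit antiderivative."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def main_term(n, a, b):
    """The main term n * integral_a^b rho_sc(y) dy of the asymptotic (4)."""
    return n * (semicircle_cdf(b) - semicircle_cdf(a))

def midpoint_integral(f, a, b, steps=20000):
    """Plain midpoint rule, used only as an independent check of semicircle_cdf."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

n = 1000
print(main_term(n, -2.0, 2.0))                      # total mass 1, so close to n
print(main_term(n, -1.0, 1.0))                      # closed form
print(n * midpoint_integral(rho_sc, -1.0, 1.0))     # numerical check
```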

One can also phrase the semicircular law in terms of the individual eigenvalues
$\lambda_i(W_n)$. If for each $1 \le i \le n$ we define the classical location $\gamma_i$ of the normalised
$i$-th eigenvalue by the formula

(5) $\int_{-\infty}^{\gamma_i} \rho_{sc}(x)\,dx = \frac{i}{n}$,

then the Wigner semicircular law (combined with an almost sure bound of $(2 + o(1))\sqrt{n}$
for the operator norm of $M_n$, due to Bai and Yin [5]) is equivalent to the
assertion that one has

(6) $\lambda_i(W_n) = \gamma_i + o(1)$

for any given $1 \le i \le n$, with probability $1 - o(1)$.
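
The classical locations defined by (5) have no simple closed form, but they can be found by bisection since the left-hand side of (5) is increasing in $\gamma_i$. The Python sketch below is only an illustration under that observation; the function names and the tolerance are our own choices.

```python
import math

def semicircle_cdf(x):
    """Integral of the semicircular density over (-infinity, x]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def classical_location(i, n, tol=1e-12):
    """Solve semicircle_cdf(gamma) = i / n for gamma in [-2, 2] by bisection."""
    target = i / n
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 10
locations = [classical_location(i, n) for i in range(1, n + 1)]
print(locations)
```

By symmetry of the density, the location with $i = n/2$ is 0 when $n$ is even, and the location with $i = n$ is the right edge 2.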

In this paper we investigate sharper versions of the semicircular law (known in
the literature as local semicircular laws), which improve upon the error terms and
failure probabilities in (4) and (6), and in which the interval $I$ is now allowed to
depend on $n$.

We first discuss the case of the Gaussian Unitary Ensemble (GUE), which is the
most well-understood case, as the joint distribution of the eigenvalues is given by
a determinantal point process. Because of this, it is known that for any interval $I$,
the random variable $N_I(W_n)$ in the GUE case obeys a law of the form

(7) $N_I(W_n) \equiv \sum_{i=1}^{\infty} \eta_i$

where the $\eta_i = \eta_{i,n,I}$ are jointly independent indicator random variables (i.e. they
take values in $\{0,1\}$); see e.g. O Corollary 4.2.24]. The mean and variance of
$N_I(W_n)$ can also be computed in the GUE case with a high degree of accuracy:

Theorem 4 (Mean and variance for GUE). Let $M_n$ be drawn from GUE, let $W_n := \frac{1}{\sqrt{n}} M_n$,
and let $I = [-\infty, x]$ for some real number $x$ (which may depend on $n$). Let
$\varepsilon > 0$ be independent of $n$.

(i) (Bulk case) If $x \in [-2 + \varepsilon, 2 - \varepsilon]$, then

$E N_I(W_n) = n \int_I \rho_{sc}(y)\,dy + O\left(\frac{\log n}{n}\right)$.

(ii) (Edge case) If $x \in [-2, 2]$, then

$E N_I(W_n) = n \int_I \rho_{sc}(y)\,dy + O(1)$.

(iii) (Variance bound) If one has $x \in [-2, 2 - \varepsilon]$ and $n^{2/3}(2 + x) \to \infty$ as $n \to \infty$,
one has

$\mathrm{Var}\, N_I(W_n) = \left(\frac{1}{2\pi^2} + o(1)\right) \log(n (2 + x)^{3/2})$.

In particular, one has $\mathrm{Var}\, N_I(W_n) = O(\log n)$ in this regime.

Here of course we use $X = O(Y)$, $X \ll Y$ or $Y \gg X$ to denote the estimate
$|X| \le CY$ for some quantity $C$ independent of $n$. We will also use $c$ to denote
various small positive constants $c > 0$ independent of $n$ (but possibly depending on
the constants in Condition C0).

Proof. See [TH Lemmas 2.1, 2.2, 2.3]. Note that the normalization conventions
in [TO] differ by a factor of $\sqrt{2}$ from the ones used here. Also, the asymptotic in
the statement of [El Lemma 2.2] is only accurate (with the $O(1)$ error term) for $t$
sufficiently close to 1, and more precisely for $t = 1 - O(n^{-2/5})$ (or, in our notation,
$x = -2 + O(n^{-2/5})$), as it implicitly relies on the approximation
$n \int_I \rho_{sc}(y)\,dy = \frac{2n}{3\pi} (2 + x)^{3/2} + O(1)$ (as written in our notation), which is only valid in this regime. □

By combining these estimates with a well-known inequality of Bennett [B], we
obtain a concentration estimate for $N_I(W_n)$ in the GUE case:

Corollary 5 (Concentration for GUE). Let $M_n$ be drawn from GUE, let $W_n := \frac{1}{\sqrt{n}} M_n$,
and let $I$ be an interval. Then one has

$P\left(\left|N_I(W_n) - n \int_I \rho_{sc}(y)\,dy\right| \ge T\right) \ll \exp(-cT)$

for all $T \ge \log n$.

Proof. By the triangle inequality we may take $I = [-\infty, x]$ for some real number
$x$. As $\rho_{sc}$ is supported on $[-2, 2]$ and has total mass 1, we see (using the trivial
bounds $0 \le N_I(W_n) \le n$ and $N_I(W_n) \le N_J(W_n)$ whenever $I \subset J$) that without
loss of generality we may assume $x \in [-2, 2]$. By (7) and Theorem 4, $N_I(W_n)$ is
then the sum of independent indicator functions whose mean $\mu$ and variance $\sigma^2$ are
given by

$\mu = n \int_I \rho_{sc}(y)\,dy + O(1)$

and $\sigma^2 = O(\log n)$ respectively. Bennett's inequality (see [B], or [22, p. 29]) then
asserts that

$P(|N_I(W_n) - \mu| \ge t) \le 2 \exp(-\sigma^2 \phi(t / \sigma^2))$

where $\phi(x) := (1 + x)\log(1 + x) - x$. Since $\phi(x) \gg x$ when $x \gg 1$, the claim
follows. □
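
The last step of this proof can be checked numerically. The Python sketch below (an illustration only; the function names and the sample values of $n$ and $t$ are hypothetical choices of ours) evaluates the Bennett bound $2\exp(-\sigma^2 \phi(t/\sigma^2))$ and confirms that $\phi(x) \ge x$ once $x \ge e^2 - 1$, which is one concrete instance of "$\phi(x) \gg x$ when $x \gg 1$".

```python
import math

def bennett_phi(x):
    """phi(x) = (1 + x) * log(1 + x) - x."""
    return (1.0 + x) * math.log1p(x) - x

def bennett_tail(t, sigma2):
    """Right-hand side 2 * exp(-sigma^2 * phi(t / sigma^2)) of Bennett's inequality."""
    return 2.0 * math.exp(-sigma2 * bennett_phi(t / sigma2))

# Hypothetical sample values: variance of order log n, deviation a multiple of it.
n = 10 ** 6
sigma2 = math.log(n)
t = 20.0 * sigma2

threshold = math.e ** 2 - 1.0
print(bennett_phi(threshold), threshold)   # phi(x) >= x at x = e^2 - 1
print(bennett_tail(t, sigma2), math.exp(-t))
```

In this sample regime the Bennett bound is smaller than $\exp(-t)$, consistent with an exponential tail of the form $\exp(-cT)$ once $T$ exceeds a multiple of $\log n$.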



Let us say that an event holds with overwhelming probability if it occurs with
probability $1 - O(n^{-A})$ for each fixed $A$. From the above corollary we see in
particular that in the GUE case, one has

$N_I(W_n) = n \int_I \rho_{sc}(y)\,dy + O(\log^{1+o(1)} n)$

with overwhelming probability for each fixed $I$, and an easy union bound argument
(ranging over all intervals $I$ in, say, $[-3, 3]$ whose endpoints are multiples of $n^{-100}$
(say)) then shows that this is also true uniformly in $I$.

Remark 6. By using a general result of Costin and Lebowitz 0, one can also 
obtain a central limit theorem for $N_I(W_n)$ as long as $I$ is not too small; see [19].
Such results have also recently been extended to more general Wigner matrices
in [8]. However, such theorems will not be the focus of the current paper.



Now we turn from the GUE case to more general Wigner ensembles. There has
been much interest in recent years in obtaining concentration results for $N_I(W_n)$
(and for closely related objects, such as the Stieltjes transform $s_{W_n}(z) := \frac{1}{n} \mathrm{trace}(W_n - z)^{-1}$
of $W_n$) for short intervals $I$, due to the applicability of such results to establishing
various universality properties of such matrices; see [TTJ[12 , 13, 3Q.l3l ^ [T4 l fT6 l fl7 ] .
The previous best result in this direction was by Erdős, Yau, and Yin [17] (see also
[9] for a variant):

Theorem 7. [17] Let $M_n$ be a Wigner matrix obeying Condition C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$.
Then, for any interval $I$, one has

(8) $P\left(\left|N_I(W_n) - n \int_I \rho_{sc}(y)\,dy\right| \ge T\right) \ll \exp(-cT^c)$

for all $T \ge \log^{A \log\log n} n$, and some constant $A > 0$.

Proof. See [17, Theorem 2.2]. □

One can reformulate (8) equivalently as the assertion that

$P\left(\left|N_I(W_n) - n \int_I \rho_{sc}(y)\,dy\right| \ge T\right) \ll \exp(\log^{O(\log\log n)} n) \exp(-cT^c)$

for all $T > 0$.

In particular, this theorem asserts that with overwhelming probability one has

$N_I(W_n) = n \int_I \rho_{sc}(y)\,dy + O(\log^{O(\log\log n)} n)$

for all intervals $I$. The proof of the above theorem is somewhat lengthy, requiring
a delicate analysis of the self-consistent equation of the Stieltjes transform of $W_n$.

Comparing this result with the previous results for the GUE case, we see that
there is a loss of a double logarithm $\log\log n$ in the exponent. The first main result
of this paper² is to remove this double logarithmic loss, at least under an additional
vanishing moment assumption:

Theorem 8 (First main theorem). Let $M_n$ be a Wigner matrix obeying Condition
C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$. Assume that $M_n$ matches moments with GUE to third
order off the diagonal (i.e. $\mathrm{Re}\,\xi_{ij}$, $\mathrm{Im}\,\xi_{ij}$ have variance $1/2$ and third moment zero).
Then, for any interval $I$, one has

$P\left(\left|N_I(W_n) - n \int_I \rho_{sc}(y)\,dy\right| \ge T\right) \ll n^{O(1)} \exp(-cT^c)$

for any $T > 0$.

This estimate is phrased for any $T$, but the bound only becomes non-trivial when
$T \gg \log^C n$ for some sufficiently large $C$. In that regime, we see that this result
removes the double-logarithmic factor from Theorem 7. In particular, this theorem
implies that with overwhelming probability one has

$N_I(W_n) = n \int_I \rho_{sc}(y)\,dy + O(\log^{O(1)} n)$

for all intervals $I$; in particular, for any $I$, $N_I(W_n)$ has variance $O(\log^{O(1)} n)$.
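
Remark 9. As we are assuming $\mathrm{Re}(\xi_{ij})$ and $\mathrm{Im}(\xi_{ij})$ to be independent, the moment
matching condition simplifies to the constraints that $E\,\mathrm{Re}(\xi_{ij})^2 = E\,\mathrm{Im}(\xi_{ij})^2 = \frac{1}{2}$
and $E\,\mathrm{Re}(\xi_{ij})^3 = E\,\mathrm{Im}(\xi_{ij})^3 = 0$. However, it is possible to extend this theorem to
the case when the real and imaginary parts of $\xi_{ij}$ are not independent; see Remark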



²We would like to thank M. Ledoux for a private conversation that led to this question.

Remark 10. The constant $c$ in the bound in Theorem 8 is quite decent in several
cases. For instance, if the atom variables of $M_n$ are Bernoulli or have sub-gaussian
tail, then we can set $c = 2/5 - o(1)$ by optimizing our arguments (details omitted).
If we assume 4 matching moments rather than 3, then we can set $c = 1$ (see Remark
26), matching the bound in Corollary 5. It is an interesting question to determine
the best value of $c$. The value of $c$ in [16] is implicit and rather small.

We prove Theorem 8 in Sections 2-4. Our argument is different from that in [17]
in that it only uses a relatively crude analysis of the self-consistent equation to
obtain some preliminary bounds on the Stieltjes transform and on $N_I$ (which were
also essentially implicit in previous literature). Instead, the bulk of the argument
relies on using the Lindeberg swapping strategy to deduce concentration of $N_I(W_n)$
in the non-GUE case from the concentration results in the GUE case provided by
Corollary 5. In order to keep the error terms in this swapping under control, three
matching moments³ are needed.

Very roughly speaking, the main idea of the argument is to show that high moments
such as

$E \left|N_I(W_n) - n \int_I \rho_{sc}(y)\,dy\right|^k$

are quite stable (in a multiplicative sense) if one swaps (the real or imaginary part
of) one of the entries of $W_n$ (and its adjoint) with another random variable that
matches the moments of the original entry to third order. For technical reasons,
however, we do not quite manipulate $N_I(W_n)$ directly, but instead work with a
proxy for this quantity, namely a certain integral of the Stieltjes transform of $W_n$.
As observed in [16], the Lindeberg swapping argument is quite simple to implement
at the level of the Stieltjes transform (due to the simplicity of the resolvent identities,
when compared against the rather complicated Taylor expansions of individual
eigenvalues used in [30]).

The result in Theorem 8 is well suited for controlling eigenvalues in the bulk of the
spectrum, but is not sufficient by itself to control eigenvalues at the edge, and in
particular the largest eigenvalue $\lambda_1(W_n)$ and the smallest eigenvalue $\lambda_n(W_n)$. However,
it is known that these eigenvalues are highly concentrated around $+2$ and $-2$
respectively. In the GUE case, we have the following concentration result of Ledoux
[24] and Aubrun [1] (see also [25], [26] for further discussion and refinements):

Theorem 11 (Concentration for GUE). [24] [1] Let $M_n$ be drawn from GUE, let
$W_n := \frac{1}{\sqrt{n}} M_n$. Then one has

$P(n^{2/3}(\lambda_1(W_n) - 2) \ge T) \ll \exp(-cT^{3/2})$

for all $T > 0$. By symmetry, we also have

$P(n^{2/3}(-\lambda_n(W_n) - 2) \ge T) \ll \exp(-cT^{3/2})$.

³Compare with the "four moment theorem" from [30]. We need one less moment here because
we are working at "mesoscopic" scales (in which the number of eigenvalues involved is much larger
than 1) rather than at "microscopic" scales. However, in Theorem 14 below, only one eigenvalue
is involved, making the problem microscopic enough to require four moments instead of three.

Remark 12. As is well known, the random variable $n^{2/3}(\lambda_1(W_n) - 2)$ in fact
converges in distribution to the Tracy-Widom law [34]. However, we will not focus
on this law here. The exponent $3/2$ on the right-hand side cannot be improved
(indeed, it matches the decay rate of the Tracy-Widom law); see Q] for further
discussion.

This result was partially extended to the Wigner case in [17]:

Theorem 13. [17] Let $M_n$ be a Wigner matrix obeying Condition C0, and let
$W_n := \frac{1}{\sqrt{n}} M_n$. Then one has

(9) $P(n^{2/3}(\lambda_1(W_n) - 2) \ge T) \ll \exp(-cT^c)$

for all $T \ge \log^{A \log\log n} n$, for some $A > 0$ independent of $n$. By symmetry, one
also has

$P(n^{2/3}(-\lambda_n(W_n) - 2) \ge T) \ll \exp(-cT^c)$.

Proof. See [17, Theorem 2.1]. □

As before, we can reformulate (9) equivalently as the assertion that

$P(n^{2/3}(\lambda_1(W_n) - 2) \ge T) \ll \exp(\log^{O(\log\log n)} n) \exp(-cT^c)$

for all $T > 0$.

Our second main result is to remove the double logarithm from Theorem 13, at
the cost of requiring matching GUE to fourth order rather than to third order:

Theorem 14 (Second main theorem). Let $M_n$ be a Wigner matrix obeying Condition
C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$. Assume that $M_n$ matches moments with GUE to
fourth order off the diagonal and second order on the diagonal (i.e. $\sigma^2 = 1$). Then
one has

$P(n^{2/3}(\lambda_1(W_n) - 2) \ge T) \ll n^{O(1)} \exp(-cT^c)$

for any $T > 0$. By symmetry, one then also has

$P(n^{2/3}(-\lambda_n(W_n) - 2) \ge T) \ll n^{O(1)} \exp(-cT^c)$.

We will derive Theorem 14 from Theorem 11 in Section 5 using the same techniques
used to derive Theorem 8 from Corollary 5.

By combining Theorem 8 and Theorem 14 one can "solve" for individual eigenvalues
$\lambda_i(W_n)$ to obtain an appropriate concentration (localization) result:

Corollary 15 (Concentration of eigenvalues). Let $M_n$ be a Wigner matrix obeying
Condition C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$. Assume that $M_n$ matches moments with
GUE to third order off the diagonal and second order on the diagonal. Then for
any $i$ with $\min(i, n - i + 1) \ge \log^{A \log\log n} n$ for some sufficiently large $A$, we have

$P(n^{2/3} \min(i, n - i + 1)^{1/3} |\lambda_i(W_n) - \gamma_i| \ge T) \ll n^{O(1)} \exp(-cT^c)$

for any $T > 0$, where the classical location $\gamma_i \in [-2, 2]$ is defined by the formula

$\int_{-2}^{\gamma_i} \rho_{sc}(y)\,dy = \frac{i}{n}$.

If we assume four matching moments, then the estimate holds for all $1 \le i \le n$.

The first part of this corollary significantly improves [30, Theorem 29]. (As a
matter of fact, the original proof of this theorem has a gap in it; see [33, Appendix
A] for a further discussion.)

Proof. By Theorems 8 and 14 and the union bound, we see that outside of an event of
probability $O(n^{O(1)} \exp(-cT^c))$, we have

$N_I = n \int_I \rho_{sc}(y)\,dy + O(T)$

for all intervals $I$, as well as the bounds

$-2 - O(n^{-2/3} T) \le \lambda_n(W_n) \le \lambda_1(W_n) \le 2 + O(n^{-2/3} T)$.

Some elementary estimation of the semicircular density $\rho_{sc}$ and its integrals $\int_I \rho_{sc}(y)\,dy$
(cf. [HI §5]) then gives

$\lambda_i(W_n) = \gamma_i + O(n^{-2/3} \min(i, n - i + 1)^{-1/3} T)$

for all $1 \le i \le n$. The claim follows (possibly after adjusting $T$ by a multiplicative
factor). □

Remark 16. The results in this paper also hold if one replaces the GUE ensemble
by the GOE ensemble. To do this, one needs to replace Theorem 4 and Theorem
11 by their GOE counterparts. The GOE version of Theorem 4 was established
by O'Rourke [29]. The GOE version of Theorem 11 can be deduced from the
GUE version (possibly at the expense of worsening the $3/2$ exponent) by using
the connection between the GOE and GUE point processes observed by Forrester
and Rains [18]; we omit the details. In principle, one might be able to use other
ensembles (such as the gaussian divisible matrices [22]) to match moments with,
which would allow one to remove the moment conditions almost entirely. We will
not pursue these matters here.

We thank Michel Ledoux and Antti Knowles for supplying some relevant references.



2. Reduction to the Stieltjes transform 



We now begin the proof of Theorem 8. The first step is to replace the counting
function $N_I = N_I(W_n)$ with the Stieltjes transform $s_{W_n}$, defined by the formula

(10) $s_{W_n}(z) := \frac{1}{n} \mathrm{trace}(W_n - z)^{-1} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\lambda_i(W_n) - z}$

for any complex number $z$ with positive imaginary part. We can express this
Stieltjes transform as a Riemann-Stieltjes integral

(11) $s_{W_n}(z) = \frac{1}{n} \int_{\mathbb{R}} \frac{1}{x - z}\,dN_{(-\infty,x)}$

which gives a clear connection between the Stieltjes transform and the counting
function. Using the heuristic $dN_{(-\infty,x)} \approx n \rho_{sc}(x)\,dx$ from (4), we thus expect to
have $s_{W_n} \approx s_{sc}$, where

$s_{sc}(z) := \int_{\mathbb{R}} \frac{1}{x - z} \rho_{sc}(x)\,dx$.

As is well known, $s_{sc}$ can be evaluated explicitly via contour integration as

(12) $s_{sc}(z) = \frac{1}{2}\left(-z + \sqrt{z^2 - 4}\right)$,

where $\sqrt{z^2 - 4}$ is the branch of the square root that is asymptotic to $z$ at infinity.
In particular, $s_{sc}$ exactly obeys the self-consistent equation

(13) $s_{sc}(z) = -\frac{1}{s_{sc}(z) + z}$.
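
The explicit formula (12) and the self-consistent equation (13) can be checked numerically. The following Python sketch is only an illustration: the function names are ours, and we realize the branch of $\sqrt{z^2 - 4}$ asymptotic to $z$ as the product of the principal square roots of $z - 2$ and $z + 2$, which is an implementation choice valid for $z$ in the upper half-plane.

```python
import cmath
import math

def s_sc(z):
    """Stieltjes transform of the semicircular law for Im z > 0.

    sqrt(z - 2) * sqrt(z + 2), with principal square roots, is the branch of
    sqrt(z^2 - 4) that behaves like z at infinity on the upper half-plane.
    """
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2

def rho_sc(x):
    """Semicircular density."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)

z = complex(0.3, 0.7)
s = s_sc(z)
residual = abs(s + 1.0 / (s + z))     # vanishes if s = -1 / (s + z)
print(residual)

# Near the real axis, Im s_sc(E + i*eta) approaches pi * rho_sc(E).
E = 0.5
print(s_sc(complex(E, 1e-8)).imag, math.pi * rho_sc(E))
```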

In the case of GUE, we may easily formalize this heuristic with the assistance of
Corollary 5.

Proposition 17 (Concentration for GUE). Let $M_n$ be drawn from GUE, and
$W_n := \frac{1}{\sqrt{n}} M_n$. Then for any $T > 0$ and any complex number $z = E + \sqrt{-1}\eta$
with $\eta > 0$, one has

$P\left(|s_{W_n}(z) - s_{sc}(z)| \ge \frac{T}{n\eta}\right) \ll n^{O(1)} \exp(-cT)$.

Proof. We may assume that $T \gg \log n$, as the claim is trivial otherwise. Let
$T_1 \gg \log n$ be chosen later. From Corollary 5 and the union bound, we see that
with probability $1 - O(n^{O(1)} \exp(-cT_1))$, one has

$\left|N_I(W_n) - n \int_I \rho_{sc}(y)\,dy\right| \ll T_1$

for all intervals $I$ in $[-3, 3]$ whose endpoints are multiples of $n^{-100}$, and hence for
all intervals $I$. In particular,

$N_{(-\infty,x)} = n \int_{-\infty}^{x} \rho_{sc}(y)\,dy + O(T_1)$

for all $x$. On the other hand, from (11) and integration by parts, one has

$s_{W_n}(z) = \frac{1}{n} \int_{\mathbb{R}} \frac{1}{(x - z)^2} N_{(-\infty,x)}\,dx$.

A similar integration by parts gives

$s_{sc}(z) = \int_{\mathbb{R}} \frac{1}{(x - z)^2} \left(\int_{-\infty}^{x} \rho_{sc}(y)\,dy\right) dx$,

and thus by the triangle inequality

$s_{W_n}(z) = s_{sc}(z) + O\left(\frac{1}{n} \int_{\mathbb{R}} \frac{1}{|x - z|^2} T_1\,dx\right)$.

The error term on the right-hand side evaluates to $O(\frac{T_1}{n\eta})$. The claim then follows
by choosing $T_1$ to be a small multiple of $T$. □

We will use this proposition to obtain a similar concentration result for Wigner
matrices:

Theorem 18 (Concentration for Wigner). Let $M_n$ be a Wigner matrix obeying
Condition C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$. Assume that $M_n$ matches moments with
GUE to third order off the diagonal. Then for any $T > 0$ and any complex number
$z = E + \sqrt{-1}\eta$ with $E \in [-3, 3]$ and $0 < \eta \ll n^{100}$, one has

$P\left(|s_{W_n}(z) - s_{sc}(z)| \ge \frac{T}{n\eta}\right) \ll n^{O(1)} \left(\exp(-cT^c) + \exp(-c(n\eta)^c)\right)$.

We prove this theorem in later sections. Let us assume it for now, and use it to
establish Theorem 8. Let $M_n$, $W_n$, $T$, $I$ be as in that theorem. By the triangle
inequality, we may take $I = (-\infty, E)$ for some real number $E$; from the support of
$\rho_{sc}$, we may assume that $E \in [-2, 2]$. We may also take $T \gg \log^{100} n$ (say), as the
claim is trivial otherwise.

Let $T_1 \gg T / \log n \gg \log^{99} n$ be a quantity to be chosen later, and set $\eta_0 := T_1 / n$.
Applying Theorem 18 and the union bound, we see that outside of an event of
probability at most

(14) $n^{O(1)} \exp(-cT_1^c)$,

one has

(15) $|s_{W_n}(E + \sqrt{-1}\eta) - s_{sc}(E + \sqrt{-1}\eta)| \ll \frac{T_1}{n\eta}$

for all $\eta$ between $\eta_0$ and $n^{100}$ which are multiples of $n^{-200}$. On the other hand, in
this range one easily verifies that $s_{W_n}$ and $s_{sc}$ are Lipschitz with Lipschitz norm at
most $O(n^{10})$ (say). As a consequence, we conclude (after conditioning outside of
the above exceptional event) that (15) holds for all $\eta$ between $\eta_0$ and $n^{100}$.

By conditioning on another event of probability at most (14), we may assume that
all entries of $M_n$ are of size at most $O(n)$ (say). Among other things, this implies
that all eigenvalues $\lambda_i(W_n)$ are (very crudely) of size at most $O(n^{20})$.

Since $\eta \ge \eta_0 = T_1 / n$, we conclude from (15) and (12) that

$|s_{W_n}(E + \sqrt{-1}\eta)| \ll 1$.

On the other hand, from (10) one has

$\mathrm{Im}\, s_{W_n}(E + \sqrt{-1}\eta) = \frac{1}{n} \sum_{i=1}^{n} \frac{\eta}{|\lambda_i(W_n) - E|^2 + \eta^2}$

and in particular

$\mathrm{Im}\, s_{W_n}(E + \sqrt{-1}\eta) \gg \frac{1}{n\eta} N_{[E - \eta, E + \eta]}$.

We conclude that

(16) $N_{[E - \eta, E + \eta]} \ll n\eta$

for all $\eta \ge \eta_0$ (note that this claim is trivial for $\eta \ge n^{100}$). (One could also have
used Proposition [501 at this juncture.) 
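
The lower bound on the imaginary part of the Stieltjes transform used to obtain (16) is a deterministic statement about any finite list of real numbers: each eigenvalue within distance $\eta$ of $E$ contributes at least $1/(2\eta)$ to the sum. The Python sketch below illustrates this with a hypothetical spectrum of our own choosing (the explicit constant $1/2$ and the function names are ours).

```python
def stieltjes(eigenvalues, z):
    """(1/n) * sum over eigenvalues of 1 / (lambda - z)."""
    n = len(eigenvalues)
    return sum(1.0 / (lam - z) for lam in eigenvalues) / n

def count_in_window(eigenvalues, E, eta):
    """Number of eigenvalues in the interval [E - eta, E + eta]."""
    return sum(1 for lam in eigenvalues if abs(lam - E) <= eta)

# Hypothetical spectrum, chosen only for illustration.
eigenvalues = [-1.7, -0.9, -0.2, 0.0, 0.1, 0.4, 1.3, 1.8]
E, eta = 0.05, 0.3
n = len(eigenvalues)

im_s = stieltjes(eigenvalues, complex(E, eta)).imag
window_count = count_in_window(eigenvalues, E, eta)
lower_bound = window_count / (2.0 * n * eta)
print(im_s, lower_bound)
```

Rearranging, a bound of the form $|s| \ll 1$ at height $\eta$ forces the count in the window to be $O(n\eta)$.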

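Next, if we integrate (15) and use the triangle inequality we observe that

(17) $\mathrm{Re} \int_{\eta_0}^{n^{100}} s_{W_n}(E + \sqrt{-1}\eta)\,d\eta = \mathrm{Re} \int_{\eta_0}^{n^{100}} s_{sc}(E + \sqrt{-1}\eta)\,d\eta + O\left(\frac{T_1 \log n}{n}\right)$.

Let us now evaluate the left-hand side. From the definition of the Stieltjes transform,
we may rewrite it as

$\frac{1}{n} \sum_{i=1}^{n} \left[\mathrm{Arg}(E + \sqrt{-1}\eta_0 - \lambda_i(W_n)) - \mathrm{Arg}(E + \sqrt{-1}n^{100} - \lambda_i(W_n))\right]$,

where Arg is the standard branch of the argument on the upper half-plane.

Since $E \in [-2, 2]$ and $\lambda_i(W_n) = O(n^{20})$, we have

$\mathrm{Arg}(E + \sqrt{-1}n^{100} - \lambda_i(W_n)) = \frac{\pi}{2} + O(n^{-50})$

(say). Also, from elementary trigonometry one has

$\mathrm{Arg}(E + \sqrt{-1}\eta_0 - \lambda_i(W_n)) = \pi 1_{\lambda_i(W_n) > E} + O\left(\frac{\eta_0}{|\lambda_i(W_n) - E| + \eta_0}\right)$.

We may therefore write the left-hand side of (17) as

$\frac{\pi}{2} - \frac{\pi}{n} N_{(-\infty,E)} + O\left(\frac{1}{n} \sum_{i=1}^{n} \frac{\eta_0}{|\lambda_i(W_n) - E| + \eta_0}\right) + O(n^{-50})$.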



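On the other hand, from (16) and dyadic decomposition (recalling that $\lambda_i(W_n) = O(n^{20})$)
one has

$\frac{1}{n} \sum_{i=1}^{n} \frac{\eta_0}{|\lambda_i(W_n) - E| + \eta_0} = O(\eta_0 \log n)$

and thus

$\mathrm{Re} \int_{\eta_0}^{n^{100}} s_{W_n}(E + \sqrt{-1}\eta)\,d\eta = \frac{\pi}{2} - \frac{\pi}{n} N_{(-\infty,E)} + O\left(\frac{T_1 \log n}{n}\right)$.

A similar argument gives

$\mathrm{Re} \int_{\eta_0}^{n^{100}} s_{sc}(E + \sqrt{-1}\eta)\,d\eta = \frac{\pi}{2} - \pi \int_{-\infty}^{E} \rho_{sc}(y)\,dy + O\left(\frac{T_1 \log n}{n}\right)$.

From (17) we thus conclude that

$N_{(-\infty,E)} = n \int_{-\infty}^{E} \rho_{sc}(y)\,dy + O(T_1 \log n)$.

Choosing $T_1$ to be a small multiple of $T / \log n$ (and bounding $T_1^c$ from below by
$T^{c'} - O(\log n)$ for some sufficiently small $c' > 0$), we obtain Theorem 8 as desired.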
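
The mechanism of this section, recovering the counting function from the real part of an integral of the Stieltjes transform, can be seen on a toy example. The Python sketch below is an illustration under our own assumptions: the spectrum, the heights `eta0` and `eta1` (standing in for $\eta_0$ and $n^{100}$), and the function names are all hypothetical. It uses the closed form in terms of Arg for the integral.

```python
import math

def re_integral_of_stieltjes(eigenvalues, E, eta0, eta1):
    """Real part of the integral of s(E + i*eta) over eta in [eta0, eta1].

    Each eigenvalue lam contributes Arg(E + i*eta0 - lam) - Arg(E + i*eta1 - lam),
    where Arg takes values in (0, pi) on the upper half-plane.
    """
    n = len(eigenvalues)
    total = 0.0
    for lam in eigenvalues:
        total += math.atan2(eta0, E - lam) - math.atan2(eta1, E - lam)
    return total / n

def estimated_count(eigenvalues, E, eta0, eta1):
    """Solve pi/2 - (pi/n) * N = integral for N, ignoring the error terms."""
    n = len(eigenvalues)
    return n * (0.5 - re_integral_of_stieltjes(eigenvalues, E, eta0, eta1) / math.pi)

# Hypothetical spectrum; exactly three eigenvalues lie below E = 0.4.
eigenvalues = [-1.5, -0.5, 0.1, 0.7, 1.9]
estimate = estimated_count(eigenvalues, 0.4, 1e-6, 1e8)
print(estimate)
```

In this toy case the error terms are negligible because `eta0` is far smaller than the distance from $E$ to the nearest eigenvalue; in the argument above they are controlled instead by (16).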

It remains to deduce Theorem 18 from Proposition 17. This will be the objective
of the next few sections.



3. The moment method, and the Lindeberg strategy 

Given a matrix $W_n = \frac{1}{\sqrt{n}} M_n$ and a complex number $z = E + \sqrt{-1}\eta$, define the
quantity $A(W_n) = A(W_n, z)$ by the formula

$A(W_n) := n\eta (s_{W_n}(z) - s_{sc}(z))$.

This quantity describes the normalised deviation of the Stieltjes transform of $W_n$
from the semicircular law at $z$. In this notation, Proposition 17 becomes the assertion
that

(18) $P(|A(W_n)| \ge T) \ll n^{O(1)} \exp(-cT)$

whenever $T > 0$, $E \in \mathbb{R}$, and $\eta > 0$, when $M_n$ is drawn from GUE. Similarly,
Theorem 18 becomes the assertion that

(19) $P(|A(W_n)| \ge T) \ll n^{O(1)} \left(\exp(-cT^c) + \exp(-c(n\eta)^c)\right)$

whenever $T > 0$, $E \in [-3, 3]$, and $0 < \eta \ll n^{100}$, when $M_n$ is a Wigner
matrix obeying Condition C0, with $\mathrm{Re}\,\xi_{ij}$ and $\mathrm{Im}\,\xi_{ij}$ having variance $1/2$ and
third moment zero for $1 \le i < j \le n$.

To deduce (19) from (18) we will use the moment method combined with the
Lindeberg exchange strategy; more specifically, we will show that a high moment
$E|A(W_n)|^k$ for some large even number $k$ (which one should think of, in practice, as
comparable to $T$) is stable under the operation of replacing (the real or imaginary
part of) one entry of $M_n$ (and its transpose) with another entry with a number of
matching moments. The Lindeberg exchange strategy is by now a standard tool in
establishing universality properties for Wigner matrices [30], [16]; the main novelty
here⁴ is the application of that strategy to a high moment $E|A(W_n)|^k$ (as opposed
to a quantity such as $E\,G(A(W_n))$ for some smooth test function $G$).

Let us now make the strategy more precise. Let us call two Wigner matrices
$M_n$, $M'_n$ real-adjacent, or adjacent for short, if their respective atom variables $\xi_{ij}$, $\xi'_{ij}$
are equal except for a single choice of $(i, j) = (a, b)$ and its transpose $(i, j) = (b, a)$,
and such that $\xi_{ab}$, $\xi'_{ab}$ either have identical real parts, or identical imaginary
parts. Thus, a Wigner matrix $M'_n$ adjacent to $M_n$ is formed by changing the
real or imaginary part of a single entry of $M_n$ and its adjoint, leaving the other
components of $M_n$ unchanged. The main technical step is then to establish the
following proposition.

Proposition 19 (Stability of moments). Let $M_n$, $M'_n$ be two adjacent Wigner matrices
obeying Condition C0, whose moments match to order $m$ for some fixed
$m = O(1)$. Let $z = E + \sqrt{-1}\eta$ for some $E \in [-3, 3]$ and $0 < \eta \ll n^{100}$, and set
$W_n := \frac{1}{\sqrt{n}} M_n$ and $W'_n := \frac{1}{\sqrt{n}} M'_n$. Then for any even integer $k \ge \log n$, one has

(20) $E|A(W_n)|^k \le \left(1 + O\left(\frac{1}{n^{(m+1)/2}}\right)\right) E|A(W'_n)|^k + O(k)^k + O(n^{O(k)} \exp(-(n\eta)^c))$.

⁴Very recently [23], a similar application of the Lindeberg exchange strategy to a high moment
of a spectral statistic was used to establish some related concentration results. We thank Antti
Knowles for bringing this preprint to our attention.


Let us assume this proposition for now and establish Theorem 18. Let $n$, $M_n$, $W_n$, $E$, $\eta$, $z$, $T$
be as in that theorem. We may assume that $T \ge \log^{C_0} n$ (say) for some sufficiently
large absolute constant $C_0$, as the claim is trivial otherwise; we may also assume
that T < nn, since the claim follows from existing local semicircle laws (in par- 
ticular, Corollary 152"]) . In particular, we may now assume that T < and 
n > log Co n/n. Our task is now to show that 

(21) $P(|A(W_n)| \ge T) \ll n^{O(1)} \exp(-cT^c)$.

On the other hand, if $M'_n$ is drawn from GUE and $W'_n := \frac{1}{\sqrt{n}} M'_n$, then from
Proposition 17 one has

$P(|A(W'_n)| \ge T) \ll n^{O(1)} \exp(-cT)$

for all $T > 0$. In particular, for any $k \ge \log n$, one has

$E|A(W'_n)|^k = \int_0^\infty P(|A(W'_n)| \ge T)\, k T^{k-1}\,dT$

(22) $\ll k n^{O(1)} \int_0^\infty e^{-cT} T^{k-1}\,dT$

$\le O(1)^k n^{O(1)} k!$

$\ll O(k)^k$.
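
The passage from an exponential tail to a moment bound in (22) rests on the identity $\int_0^\infty k T^{k-1} e^{-cT}\,dT = k!/c^k$, which is at most $(k/c)^k$. The Python sketch below checks this numerically for one hypothetical choice of $k$ and $c$; the function name, the truncation point and the step count are our own choices.

```python
import math

def moment_from_tail(k, c, steps=200000):
    """Midpoint-rule value of the integral of k * T^(k-1) * exp(-c*T) over T >= 0.

    The integral is truncated at a point beyond which the integrand is negligible.
    """
    upper = (4.0 * k + 40.0) / c
    h = upper / steps
    total = 0.0
    for i in range(steps):
        T = (i + 0.5) * h
        total += k * T ** (k - 1) * math.exp(-c * T)
    return total * h

k, c = 4, 0.5                          # hypothetical sample values
numeric = moment_from_tail(k, c)
exact = math.factorial(k) / c ** k     # k! / c^k
print(numeric, exact, (k / c) ** k)
```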



We can replace $M'_n$ with $M_n$ in a sequence of $n^2$ exchanges from one Wigner
matrix to a real-adjacent one; $n^2 - n$ of these exchanges arise by swapping the real
or imaginary part of an off-diagonal entry $\xi'_{ij}$ of $M'_n$ (and its transpose $\xi'_{ji}$) with
the corresponding component of $M_n$, and $n$ of these exchanges arise by swapping
a diagonal entry $\xi'_{ii}$ of $M'_n$ with the corresponding entry of $M_n$. We perform these
exchanges in an arbitrary order. By hypothesis, for the $n^2 - n$ off-diagonal exchanges
one has matching moments to order $m = 3$, while for the diagonal exchanges one
has matching moments to order $m = 1$. Applying Proposition 19 $n^2 - n$ times
with $m = 3$ and $n$ times with $m = 1$ and concatenating, we conclude that for any
$k \ge \log n$ one has

$E|A(W_n)|^k \le O(1)\, E|A(W'_n)|^k + O(n^2)\, O(k)^k + O(n^{O(k)} \exp(-(n\eta)^c))$.
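
The reason the multiplicative factors from (20) concatenate to $O(1)$ is a simple count: each off-diagonal exchange costs a factor $1 + O(n^{-2})$ and there are $n^2 - n$ of them, while each diagonal exchange costs $1 + O(n^{-1})$ and there are $n$ of them. The Python sketch below illustrates this; the constant `C` is a hypothetical stand-in for the implied constants, and the function names are ours.

```python
import math

def exchange_counts(n):
    """Number of off-diagonal and diagonal exchanges in the swapping process."""
    off_diagonal = n * n - n    # real and imaginary parts of the n(n-1)/2 upper entries
    diagonal = n
    return off_diagonal, diagonal

def accumulated_factor(n, C=1.0):
    """Product of the per-exchange factors 1 + C * n^(-(m+1)/2)."""
    off_diagonal, diagonal = exchange_counts(n)
    per_off_diagonal = 1.0 + C * n ** -2.0    # m = 3
    per_diagonal = 1.0 + C * n ** -1.0        # m = 1
    return per_off_diagonal ** off_diagonal * per_diagonal ** diagonal

for n in (10, 100, 1000):
    print(n, sum(exchange_counts(n)), accumulated_factor(n))
```

Under these assumptions the product stays below $e^{2C}$ for every $n$, so it is bounded independently of $n$.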

In particular, from (22) one has

$E|A(W_n)|^k \le O(k)^k + O(n^{O(k)} \exp(-(n\eta)^c))$

and hence by Markov's inequality

$P(|A(W_n)| \ge T) \ll \left(\frac{O(k)}{T}\right)^k + T^{-k} n^{O(k)} \exp(-(n\eta)^c)$.

If we set $k$ to be the largest even integer less than $T^{c_0}$ for some absolute constant
$c_0$, and if $C_0$ is sufficiently large depending on $c_0$, we obtain (21) as desired, thanks
to the assumptions $\log^{C_0} n \le T \le n\eta$.

Remark 20. An inspection of the above argument reveals that we in fact have the
slight refinement

$P(|A(W_n)| \ge T) \ll n^{O(1)} \exp(-cT)$

in the regime $T \le (n\eta)^c$, since in this regime we may take $k$ to be a small multiple of
$T$ (rounded off to the nearest even integer, of course). Unfortunately, this refinement
does not appear to immediately offer any significant improvement to the conclusion
of Theorem 8.
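
To see why taking $k$ to be a small multiple of $T$ yields an exponential bound, note that a moment bound of the form $(Ck)^k$ together with Markov's inequality gives the tail bound $(Ck/T)^k$, which is at most $e^{-k}$ as soon as $k \le T/(eC)$. The Python sketch below illustrates this choice; the constant `C`, the sample value of $T$ and the function names are hypothetical.

```python
import math

def markov_bound(T, k, C=1.0):
    """Tail bound (C * k / T)^k obtained from a k-th moment bound (C * k)^k."""
    return (C * k / T) ** k

def choose_even_k(T, C=1.0):
    """Largest even integer not exceeding T / (e * C), and at least 2."""
    k = 2 * int(T / (2.0 * math.e * C))
    return max(k, 2)

T = 200.0
k = choose_even_k(T)
bound = markov_bound(T, k)
print(k, bound, math.exp(-k))
```

Since this $k$ differs from $T/(eC)$ by at most 2, the resulting bound is of the form $\exp(-cT)$ up to a constant factor.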

It remains to establish Proposition Q1JJ This will be achieved in the next section. 
4. Stability of high moments 

We now prove Proposition 19. We introduce a definition:

Definition 21 (Elementary matrix). An elementary matrix is a matrix which has one of the following forms

$$V = e_a e_a^*, \quad e_a e_b^* + e_b e_a^*, \quad \sqrt{-1}\, e_a e_b^* - \sqrt{-1}\, e_b e_a^* \tag{23}$$

with $1 \le a, b \le n$ distinct, where $e_1, \ldots, e_n$ is the standard basis of $\mathbf{C}^n$.
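For concreteness, the three forms in (23) can be written out explicitly. The following sketch uses 0-based indices and plain nested lists; the names `elementary_matrices` and `is_hermitian` are illustrative and not part of the text.

```python
from typing import List

Matrix = List[List[complex]]


def elementary_matrices(n: int, a: int, b: int) -> List[Matrix]:
    """Return the three elementary matrices of (23) for distinct 0-based indices a, b."""
    if a == b or not (0 <= a < n and 0 <= b < n):
        raise ValueError("a and b must be distinct indices in range(n)")

    def zeros() -> Matrix:
        return [[0j] * n for _ in range(n)]

    diag = zeros()            # e_a e_a^*
    diag[a][a] = 1 + 0j

    real_part = zeros()       # e_a e_b^* + e_b e_a^*
    real_part[a][b] = 1 + 0j
    real_part[b][a] = 1 + 0j

    imag_part = zeros()       # sqrt(-1) e_a e_b^* - sqrt(-1) e_b e_a^*
    imag_part[a][b] = 1j
    imag_part[b][a] = -1j

    return [diag, real_part, imag_part]


def is_hermitian(m: Matrix) -> bool:
    """Check m[i][j] == conj(m[j][i]) for all entries."""
    size = len(m)
    return all(m[i][j] == m[j][i].conjugate() for i in range(size) for j in range(size))
```

Each of these matrices is Hermitian and has at most two nonzero entries, so adding a real multiple of one of them changes a single real degree of freedom of a Hermitian matrix.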

As $M_n, M'_n$ are real-adjacent, one can write

$$M_n = M^0 + \xi V; \quad M'_n = M^0 + \xi' V$$

for some elementary matrix $V$, some random matrix $M^0$, and some real random variables $\xi, \xi'$ independent of $M^0$ that match moments to $m^{th}$ order and obey the exponential decay condition

$$\mathbf{P}(|\xi| \ge t^C),\ \mathbf{P}(|\xi'| \ge t^C) \le e^{-t} \tag{24}$$

for all $t \ge C'$ and some $C, C' > 0$.

We now recall some (deterministic) resolvent stability results concerning matrices of the form $M^0 + tV$. Define the matrix norm $\|R\|_{(\infty,1)}$ of an $n \times n$ matrix $R = (R_{ij})_{1 \le i,j \le n}$ by the formula

$$\|R\|_{(\infty,1)} := \sup_{1 \le i,j \le n} |R_{ij}|.$$

Proposition 22 (Stability of resolvent). Let $M^0$ be a Hermitian matrix, let $V$ be an elementary matrix, and let $t$ be a real number. Let $z := E + \sqrt{-1}\eta$ be a complex number with $\eta > 0$. Write

$$R_t := \left(\tfrac{1}{\sqrt{n}}(M^0 + tV) - z\right)^{-1}$$

and suppose that

$$|t|\, \|R_0\|_{(\infty,1)} = o(\sqrt{n}).$$

Then

$$\|R_t\|_{(\infty,1)} = (1 + o(1))\, \|R_0\|_{(\infty,1)}.$$
Furthermore, if we set $s_t := \frac{1}{n} \operatorname{trace} R_t$, then we have the Taylor expansion

$$s_t = s_0 + \sum_{j=1}^m n^{-j/2} c_j t^j + O\left(n^{-(m+1)/2} |t|^{m+1} \|R_0\|_{(\infty,1)}^{m+1} \min\left(\|R_0\|_{(\infty,1)}, \frac{1}{n\eta}\right)\right)$$

for any fixed nonnegative $m = O(1)$, where the coefficients $c_j$ are independent of $t$ and obey the bounds

$$|c_j| \ll \|R_0\|_{(\infty,1)}^{j} \min\left(\|R_0\|_{(\infty,1)}, \frac{1}{n\eta}\right) \tag{25}$$

for all $1 \le j \le m$.

Proof. See Lemma 12] and 021 Proposition 13]. □ 
Our objective is to establish (20). From Corollary 33 we see that

$$\|R_\xi\|_{(\infty,1)} = O(1)$$

with probability $1 - O(n^{O(1)} \exp(-(n\eta)^c))$, while from (24) we certainly have $\xi = o(\sqrt{n})$ with probability $1 - O(n^{O(1)} \exp(-(n\eta)^c))$. Hence by Proposition 22 (reversing the roles of $R_0$ and $R_\xi$) we have

$$\|R_0\|_{(\infty,1)} = O(1) \tag{26}$$

with probability $1 - O(n^{O(1)} \exp(-(n\eta)^c))$. Using the crude bound $A(W_n) = O(n^{O(1)})$, we may thus condition $M^0$ to be fixed and obeying (26), since the contribution of the event where (26) fails to $\mathbf{E} A(W_n)^k$ is $O(n^{O(k)} \exp(-(n\eta)^c))$.

By Proposition 22 we thus see that whenever $\xi = o(\sqrt{n})$, one has

$$A(W_n) = A_0 + \sum_{j=1}^m a_j (\xi/\sqrt{n})^j + O((|\xi|/\sqrt{n})^{m+1}) \tag{27}$$

where the coefficients $A_0, a_j$ are deterministic (and in particular independent of $\xi, \xi'$), and $a_j$ obeys the bound $a_j = O(1)$.

Suppose first that $|A_0| \le k$. Then one has

$$|A(W_n)| \ll k$$

whenever $\xi = o(\sqrt{n})$, which gives a net contribution of $O(k)^k$ to $\mathbf{E}|A(W_n)|^k$; meanwhile, from (24), the case when $\xi \gg \sqrt{n}$ contributes at most $O(n^{O(k)} \exp(-(n\eta)^c))$. Thus we may assume that $|A_0| > k$. Thus we have

$$A(W_n) = A_0 \left(1 + \frac{1}{k}\left(\sum_{j=1}^m b_j (\xi/\sqrt{n})^j + O((|\xi|/\sqrt{n})^{m+1})\right)\right)$$

for some deterministic coefficients $b_1, \ldots, b_m = O(1)$, and assuming that $\xi = o(\sqrt{n})$. Raising this to the $k^{th}$ power (after using Taylor's theorem with remainder to expand $(1 + \frac{1}{k}x)^k$ to $m^{th}$ order in the regime $x = o(1)$), we conclude that

$$A(W_n)^k = A_0^k \left(1 + \sum_{j=1}^m d_j (\xi/\sqrt{n})^j + O((|\xi|/\sqrt{n})^{m+1})\right)$$

for some deterministic coefficients $d_1, \ldots, d_m = O(1)$ (which are allowed to depend on $k$), whenever $\xi = o(\sqrt{n})$. Taking (conditional) expectations in $\xi$ (using (24) and the trivial bound $A(W_n) = O(n^{O(1)})$ to handle the tail event when $|\xi| \gg \sqrt{n}$) we conclude that

$$\mathbf{E}(A(W_n)^k \mid M^0) = A_0^k \left(1 + \sum_{j=1}^m d_j n^{-j/2} \mathbf{E}\xi^j + O(n^{-(m+1)/2})\right) + O(n^{O(k)} \exp(-(n\eta)^c))$$

and thus

$$\mathbf{E} A(W_n)^k = \mathbf{E}\left(A_0^k \left(1 + \sum_{j=1}^m d_j n^{-j/2} \mathbf{E}\xi^j + O(n^{-(m+1)/2})\right)\right) + O(n^{O(k)} \exp(-(n\eta)^c)) + O(k)^k.$$

Similarly we have

$$\mathbf{E} A(W'_n)^k = \mathbf{E}\left(A_0^k \left(1 + \sum_{j=1}^m d_j n^{-j/2} \mathbf{E}(\xi')^j + O(n^{-(m+1)/2})\right)\right) + O(n^{O(k)} \exp(-(n\eta)^c)) + O(k)^k.$$

Since $\xi$ and $\xi'$ match to order $m$, we obtain the claim. This concludes the proof of Proposition 19 and hence Theorem 5.

Remark 23. It is possible to adapt the above arguments to the case when $\operatorname{Re} \xi_{ij}$ and $\operatorname{Im} \xi_{ij}$ are not assumed to be independent. The main new difficulty is that instead of swapping the real and imaginary parts of a single entry $\xi_{ij}$ of $M_n$ (and its transpose $\xi_{ji}$) separately, one has to swap them together. This requires one to consider perturbations of the form

$$M_n = M^0 + \xi_1 V_1 + \xi_2 V_2$$

where $V_1, V_2$ are two distinct elementary matrices, and $\xi_1, \xi_2$ are real random variables that are not necessarily independent and obey the exponential decay hypothesis (24). However, it is possible to extend Proposition 22 without much difficulty to the case of two-parameter perturbations and perform a similar argument to that given above. We omit the details.

5. Extreme eigenvalues 

We now prove Theorem 14 by combining the arguments in previous sections with some ideas from [17] (and in particular, demonstrating a concentration of $\operatorname{Im} s_{W_n}(E + \sqrt{-1}\eta)$ that is better than $1/(n\eta)$ for some energy $E > 2$). By symmetry, it suffices to prove the bound for $\lambda_1(W_n)$. We may of course assume that $n$ is large.

By standard large deviation estimates, one has

$$\mathbf{P}(\lambda_1(W_n) \ge E) \le \exp(-c n^c \log E)$$

for any $E \ge 3$; see¹ [THl Lemma 7.2]. This already deals with the case when $n^{2/3} \le T \le n^{100}$ (say), and the case $T > n^{100}$ can be handled by crudely bounding $\lambda_1(W_n)$ by, say, the Frobenius norm of $W_n$ and using Condition C0. Thus we may restrict attention to the regime $T \le n^{2/3}$, and show that

$$\mathbf{P}(2 + n^{-2/3} T \le \lambda_1(W_n) \le 3) \le n^{O(1)} \exp(-cT^c).$$



¹ One could also use the earlier estimates in [28] or [2]; see also [3] for more discussion.

We may assume that $T \ge \log^{C_0} n$ for some suitably large absolute constant $C_0$, as the claim is trivial otherwise.

Suppose that $\lambda_1(W_n)$ was in the interval $[2 + n^{-2/3}T, 3]$. Set $\eta := n^{-2/3}$, and let $B(W_n)$ denote the quantity

$$B(W_n) := n\eta \operatorname{Im} s_{W_n}(E + \sqrt{-1}\eta).$$

From the identity

$$B(W_n) = \sum_{i=1}^n \frac{\eta^2}{|\lambda_i(W_n) - E|^2 + \eta^2} \tag{28}$$

we conclude in particular that

$$B(W_n) \ge \frac{1}{2}$$

where $E$ is the closest multiple of $n^{-2/3}$ in $[2 + n^{-2/3}T, 3]$ to $\lambda_1(W_n)$. Thus, by the union bound, it will suffice to show that

$$\mathbf{P}\left(B(W_n) \ge \frac{1}{2}\right) \ll n^{O(1)} \exp(-cT^c) \tag{29}$$

for any fixed $E \in [2 + n^{-2/3}T, 3]$.
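The identity (28) and the resulting lower bound can be checked on any finite list of eigenvalues. The sketch below is illustrative only: it takes the eigenvalues as given numbers (no random matrix is sampled), uses the standard definition $s(z) = \frac{1}{n}\sum_i (\lambda_i - z)^{-1}$ of the Stieltjes transform, and the function names are hypothetical.

```python
from typing import Sequence


def stieltjes(eigs: Sequence[float], z: complex) -> complex:
    """Stieltjes transform s(z) = (1/n) * sum_i 1 / (lambda_i - z)."""
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)


def b_from_stieltjes(eigs: Sequence[float], energy: float, eta: float) -> float:
    """n * eta * Im s(E + i*eta), the definition of B."""
    return len(eigs) * eta * stieltjes(eigs, complex(energy, eta)).imag


def b_from_sum(eigs: Sequence[float], energy: float, eta: float) -> float:
    """Right-hand side of (28): sum_i eta^2 / ((lambda_i - E)^2 + eta^2)."""
    return sum(eta * eta / ((lam - energy) ** 2 + eta * eta) for lam in eigs)
```

Every summand is nonnegative, and a summand with $|\lambda_i - E| \le \eta$ is at least $1/2$, so a single eigenvalue within $\eta$ of $E$ already forces the sum to be at least $1/2$.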

Let $M'_n$ be drawn from GUE, and set $W'_n := \frac{1}{\sqrt{n}} M'_n$. By Theorem 11 we have

$$\lambda_1(W'_n) \le 2 + n^{-2/3} T/2$$

outside of an event of probability $O(\exp(-cT^{3/2}))$. Also, from Corollary 5 and the union bound we see that outside of an event of probability $O(n^{O(1)} \exp(-cT^c))$, one has

$$N_I(W'_n) = n \int_I \rho_{sc}(y)\, dy + O(T^{0.1})$$

(say) for all intervals $I$. In particular, we see that outside of an event of probability $O(n^{O(1)} \exp(-cT^c))$, one has

$$N_{[E - n^{-2/3}T/2,\ E + n^{-2/3}T/2]}(W'_n) = 0$$

and

$$N_{[E - 2^k n^{-2/3}T,\ E + 2^k n^{-2/3}T]}(W'_n) \ll 2^{3k/2} T^{3/2}$$

for all $k \ge 1$. From this, (28), and dyadic decomposition one easily establishes that

$$B(W'_n) \ll \frac{1}{T^{1/2}}$$

outside of an event of probability $O(n^{O(1)} \exp(-cT^c))$.
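The dyadic decomposition can be made explicit as follows. This is a sketch under stated assumptions: eigenvalues are grouped into shells at distance between $2^{k-2}\eta T$ and $2^{k-1}\eta T$ from $E$ for $k \ge 1$, the number of eigenvalues in the $k$-th shell is bounded by $C\, 2^{3k/2} T^{3/2}$, and the constant `C` and the function name are hypothetical.

```python
def dyadic_bound(T: float, C: float = 1.0, shells: int = 200) -> float:
    """Upper bound for the sum in (28), assuming no eigenvalue lies within eta*T/2 of E.

    Shell k holds at most C * 2^(3k/2) * T^(3/2) eigenvalues, each contributing
    at most eta^2 / (2^(k-2) * eta * T)^2 = 16 * 2^(-2k) / T^2 (eta cancels).
    """
    total = 0.0
    for k in range(1, shells + 1):
        count = C * 2.0 ** (1.5 * k) * T ** 1.5
        per_term = 1.0 / (2.0 ** (k - 2) * T) ** 2
        total += count * per_term
    return total
```

The shell contributions form a geometric series with ratio $2^{-1/2}$, so the total is at most $16(\sqrt{2}+1)\, C\, T^{-1/2}$, in agreement with a bound of order $T^{-1/2}$.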

Let $\log n \le k \le n^{0.01}$ be an integer to be chosen later. Since we may trivially bound $\operatorname{Im} s_{W_n}(E + \sqrt{-1}\eta)$ by $n^{O(1)}$, we conclude that

$$\mathbf{E} B(W'_n)^k \ll O\left(\frac{1}{T^{1/2}}\right)^k + n^{O(k)} \exp(-cT^c). \tag{30}$$



We claim the following stability result for $\mathbf{E} B(W_n)^k$, analogous to Proposition 19:

Proposition 24 (Stability of moments). Let $M_n, M'_n$ be two adjacent Wigner matrices obeying Condition C0, whose moments match to order $m$ for some fixed $m = O(1)$. Set $W_n := \frac{1}{\sqrt{n}} M_n$ and $W'_n := \frac{1}{\sqrt{n}} M'_n$. Then for any integer $\log n \le k \le n^{0.01}$, one has

$$\mathbf{E} B(W_n)^k \le (1 + O((k/\sqrt{n})^{m+1}))\, \mathbf{E} B(W'_n)^k + O(100^{-k}) + O(n^{O(k)} \exp(-cT^c)). \tag{31}$$

Applying this proposition $n^2 - n$ times with $m = 4$ and $n$ times with $m = 2$ we conclude that

$$\mathbf{E} B(W_n)^k \le (1 + O(k^5/n^{5/2}))^{n^2 - n} (1 + O(k^3/n^{3/2}))^{n} \left(\mathbf{E} B(W'_n)^k + O(n^{O(1)} 100^{-k}) + O(n^{O(k)} \exp(-cT^c))\right)$$

and thus (using (30) and the hypothesis $k \le n^{0.01}$)

$$\mathbf{E} B(W_n)^k \ll n^{O(1)} 100^{-k} + O(n^{O(k)} \exp(-cT^c)).$$

The desired claim (29) then follows from Markov's inequality by taking $k = T^{c_0}$ for some sufficiently small $c_0 > 0$ (and assuming $C_0$ sufficiently large depending on $c_0$).

It remains to establish Proposition 24. As in the previous section, we write

$$M_n = M^0 + \xi V; \quad M'_n = M^0 + \xi' V$$

for some elementary matrix $V$, some random matrix $M^0$, and some real random variables $\xi, \xi'$ independent of $M^0$ that match moments to $m^{th}$ order and obey the exponential decay condition (24). Arguing exactly as before, we may condition $M^0$ to be a deterministic matrix for which

$$\|R_0\|_{(\infty,1)} = O(1).$$

Using Proposition 22 as before, we see that

$$B(W_n) = B_0 + \sum_{j=1}^m a_j (\xi/\sqrt{n})^j + O((|\xi|/\sqrt{n})^{m+1})$$

for some deterministic coefficients $B_0$ and $a_j = O(1)$, whenever $\xi = o(\sqrt{n})$.

Suppose first that $|B_0| \le 1/200$. Then one has $|B(W_n)| \le 1/100$ whenever $\xi = o(\sqrt{n})$, and so this case contributes $O(100^{-k}) + O(n^{O(k)} \exp(-cn^c))$ to (31), which is acceptable. Thus we may restrict attention to the case when $|B_0| > 1/200$. Then we may write

$$B(W_n) = B_0 \left(1 + \sum_{j=1}^m b_j (\xi/\sqrt{n})^j + O((|\xi|/\sqrt{n})^{m+1})\right)$$

whenever $\xi = o(\sqrt{n})$, where the $b_j = O(1)$ are deterministic coefficients.

Suppose now that $\xi = O(n^{0.3})$. Since $k \le n^{0.01}$, we may perform a Taylor expansion of $(1 + x)^k$ to order $m$ for $x = O(n^{-0.2})$ and conclude that

$$B(W_n)^k = B_0^k \left(1 + \sum_{j=1}^m c_j (\xi/\sqrt{n})^j + O((k|\xi|/\sqrt{n})^{m+1})\right)$$






in this regime, where the $c_j = O(1)$ are deterministic coefficients (which are allowed to depend on $k$). Taking expectations as in the preceding section, and using (24) to handle those $\xi$ with $|\xi| > n^{0.3}$, we conclude that

$$\mathbf{E} B(W_n)^k = \mathbf{E}\left(B_0^k \left(1 + \sum_{j=1}^m c_j n^{-j/2} \mathbf{E}\xi^j + O((k/\sqrt{n})^{m+1})\right)\right) + O(n^{O(k)} \exp(-(n\eta)^c)) + O(100^{-k}),$$

and similarly for $\mathbf{E} B(W'_n)^k$; and the claim follows from the matching moments hypothesis.

Remark 25. As in Remark 23, it is possible to extend these arguments to the case when $\operatorname{Re}(\xi_{ij})$ and $\operatorname{Im}(\xi_{ij})$ are not independent; we leave the details to the interested reader.

Remark 26. Note that when one has four matching moments rather than three, the error terms are more favorable by a factor of $\sqrt{n}$, giving some additional room to vary the parameters of the argument by small powers of $n$. Because of this, it is possible to modify the proof of Theorem 18 to conclude in this case that

$$\mathbf{P}(|A(W_n)| \ge T) \le \exp(-cT)$$

in the regime $0 < T \le n^c$ for a sufficiently small $c$. This is achieved by arguing as in this section, except that one allows the resolvent norm $\|R_0\|_{(\infty,1)}$ to be as large as $O(n^c)$ rather than $O(1)$ in order to keep the failure probability bounded by $O(n^{O(1)} \exp(-n^c))$ rather than $O(n^{O(1)} \exp(-(n\eta)^c))$. We omit the details. As a consequence, we can sharpen the conclusion of Theorem 5 to

$$\mathbf{P}\left(\left|N_I(W_n) - n \int_I \rho_{sc}(y)\, dy\right| \ge T\right) \ll n^{O(1)} \exp(-cT)$$

when $0 < T \le n^c$ and $M_n$ matches moments with GUE to fourth order off the diagonal and second order on the diagonal.



Appendix A. Local semicircle law 



In this appendix we establish some preliminary local semicircle law estimates, 
following the treatment in [16] and [30]. As the methods used here are now standard, 
and the results very close to those in [16] and [30], we shall be somewhat brief in 
our treatment. 

We first recall a concentration estimate of Hanson and Wright [20].

Proposition 27 (Concentration of quadratic forms). Let $X \in \mathbf{C}^n$ be a vector of independent random variables $\xi_1, \ldots, \xi_n$ of mean zero and variance $\sigma^2$, obeying the uniform subexponential decay bound

$$\mathbf{P}(|\xi_i| \ge t^C \sigma) \le e^{-t}$$

for all $t \ge C'$ and $1 \le i \le n$, and some $C, C' > 0$ independent of $n$. Let $A$ be an $n \times n$ matrix. Then for any $T > 0$, one has

$$\mathbf{P}\left(|X^* A X - \sigma^2 \operatorname{trace} A| \ge T \sigma^2 (\operatorname{trace}(A^* A))^{1/2}\right) \le \exp(-cT^c).$$

Thus

$$X^* A X = \sigma^2 \left(\operatorname{trace} A + O(T (\operatorname{trace}(A^* A))^{1/2})\right)$$

outside of an event of probability $O(\exp(-cT^c))$.



Proof. See [16, Lemma B.1]. (Note that a factor of $\sigma$ is missing from the statement of the exponential decay hypothesis in the lemma as stated in [16], which is needed in order to reduce to the $\sigma = 1$ case.) □

Corollary 28 (Distance between a random vector and a subspace). Let $X$ and $\sigma$ be as in Proposition 27, and let $V$ be a $d$-dimensional complex subspace of $\mathbf{C}^n$. Let $\pi_V$ be the orthogonal projection to $V$. Then one has

$$0.9\, d\sigma^2 \le \|\pi_V(X)\|^2 \le 1.1\, d\sigma^2$$

outside of an event of probability $O(\exp(-cd^c))$.



Proof. Apply the preceding proposition with $A := \pi_V$ (so $\operatorname{trace} A = \operatorname{trace} A^* A = d$) and $T := d^{1/2}/10$. □
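A quick simulation illustrates the scale of the concentration. The sketch below makes simplifying assumptions that are not in the statement: the entries are real Gaussians of variance $\sigma^2$ (one distribution satisfying the hypotheses), and $V$ is the span of $d$ coordinate vectors, so that $\|\pi_V(X)\|^2$ is just a sum of $d$ squared entries. The function name is hypothetical.

```python
import random


def projected_norm_sq(d: int, sigma: float, rng: random.Random) -> float:
    """||pi_V(X)||^2 when V is spanned by d coordinate vectors and X has
    i.i.d. N(0, sigma^2) entries; only the d relevant coordinates are sampled."""
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(d))
```

With $T = d^{1/2}/10$ the allowed deviation $T\sigma^2 (\operatorname{trace} A^*A)^{1/2}$ equals $d\sigma^2/10$, which is exactly the $\pm 10\%$ window in the corollary; in this Gaussian model the typical fluctuation is of order $\sigma^2 d^{1/2}$, far inside that window once $d$ is large.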

Remark 29. We can also use Talagrand's inequality as in [30], combined with a truncation argument (to bound each entry by some properly chosen quantity $K$). In the case when the atom variables have very fast decay (such as sub-gaussian) or are bounded (such as Bernoulli), this calculation will actually lead to a decent bound on the value of $c$ in Theorem 8.



We can now establish a crude upper bound on the counting function $N_I$ of a Wigner matrix.

Proposition 30 (Crude upper bound). Let $M_n$ be a Wigner matrix obeying Condition C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$. Then for any interval $I$, one has

$$N_I(W_n) = O(n|I|)$$

outside of an event of probability $O(n^{O(1)} \exp(-c(n|I|)^c))$.



Proof. Fix $I$, which we write as $I = [E - \eta, E + \eta]$. Suppose that

$$N_I(W_n) \ge Cn\eta \tag{32}$$

for some sufficiently large absolute constant $C$ to be chosen later. We will show that this leads to a contradiction outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$.

From the identity

$$\operatorname{Im} s_{W_n}(E + \sqrt{-1}\eta) = \frac{1}{n} \sum_{i=1}^n \frac{\eta}{|\lambda_i(W_n) - E|^2 + \eta^2}$$

and (32), we see that

$$\operatorname{Im} s_{W_n}(E + \sqrt{-1}\eta) \gg C.$$

On the other hand, we can write the Stieltjes transform $s_{W_n}$ in terms of the coefficients $R_{ij}$ of the resolvent as

$$s_{W_n}(E + \sqrt{-1}\eta) = \frac{1}{n} \sum_{i=1}^n R_{ii}(E + \sqrt{-1}\eta).$$

Thus, by the pigeonhole principle, we have

$$\operatorname{Im} R_{ii}(E + \sqrt{-1}\eta) \gg C$$

for some $1 \le i \le n$. By symmetry (and conceding a factor of $n$ in the failure probability estimates) we may take $i = n$.

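Now, a standard Schur complement computation (see e.g. [30, Lemma 42]) shows that

$$R(z)_{nn} = \frac{1}{\frac{1}{\sqrt{n}} \xi_{nn} - z - X^* R^{(n)}(z) X} \tag{33}$$

where $R^{(n)}(z) = (W_n^{(n)} - z)^{-1}$ is the resolvent corresponding to the $(n-1) \times (n-1)$ matrix $W_n^{(n)}$ formed by removing the $n^{th}$ row and column from $W_n$, $\xi_{nn}$ is the bottom right entry of $M_n$, and $X$ is the rightmost column of $W_n$ (after removing the bottom entry $\frac{1}{\sqrt{n}} \xi_{nn}$). In particular, using the trivial bound $|\operatorname{Im} \frac{1}{z}| \le \frac{1}{|\operatorname{Im} z|}$, we conclude that

$$\operatorname{Im} R_{nn}(E + \sqrt{-1}\eta) \le \frac{1}{\eta + \operatorname{Im} X^* R^{(n)}(E + \sqrt{-1}\eta) X} \le \frac{1}{\operatorname{Im} X^* R^{(n)}(z) X}$$

and thus

$$\operatorname{Im} X^* R^{(n)}(E + \sqrt{-1}\eta) X \ll C^{-1}.$$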
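The Schur complement formula (33) can be checked by hand in the smallest nontrivial case. The sketch below treats a $2 \times 2$ Hermitian matrix with rows $(w_{11}, x)$ and $(\bar{x}, w_{22})$, so that the minor is the $1 \times 1$ matrix $(w_{11})$ and $X = (x)$; the function names and the sample values are illustrative assumptions.

```python
def resolvent_corner_direct(w11: float, w22: float, x: complex, z: complex) -> complex:
    """Bottom-right entry of (W - z)^(-1), from the explicit 2x2 inverse."""
    det = (w11 - z) * (w22 - z) - x * x.conjugate()
    return (w11 - z) / det


def resolvent_corner_schur(w11: float, w22: float, x: complex, z: complex) -> complex:
    """Same entry via 1 / (w22 - z - X^* R^(n)(z) X), as in (33)."""
    minor_resolvent = 1.0 / (w11 - z)               # resolvent of the 1x1 minor
    quad_form = x.conjugate() * minor_resolvent * x  # X^* R^(n)(z) X
    return 1.0 / (w22 - z - quad_form)
```

Both routes give the same value, and its imaginary part is positive whenever $\operatorname{Im} z > 0$, which is the positivity used when taking imaginary parts of (33).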

Now, by the Cauchy interlacing law, $W_n^{(n)}$ has $\gg Cn\eta$ consecutive eigenvalues in $I$. There are $O(n^2)$ possibilities for the starting and ending index of these eigenvalues. If we let $V$ be the space spanned by the corresponding eigenvectors, then $\dim(V) \gg Cn\eta$, and from the spectral theorem we see that

$$\operatorname{Im} X^* R^{(n)}(E + \sqrt{-1}\eta) X \gg \|\pi_V(X)\|^2 / \eta$$

and thus

$$\|\pi_V(X)\|^2 \ll \frac{\eta}{C}.$$

On the other hand, from Corollary 28 we see that

$$\|\pi_V(X)\|^2 \gg C\eta$$

outside of an event of probability $O(\exp(-c(n\eta)^c))$. If $C$ is sufficiently large, the claim follows. □



This gives rise to a self-consistent equation: 

Proposition 31 (Self-consistent equation). Let $M_n$ be a Wigner matrix obeying Condition C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$. Then for any $z = E + \sqrt{-1}\eta$ with $E = O(1)$ and $0 < \eta \ll n^{100}$, and all $1 \le i \le n$, one has

$$R(z)_{ii} = -\frac{1}{s_{W_n}(z) + z + o(1)}$$

outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$. In particular, by the union bound, we have

$$s_{W_n}(z) = -\frac{1}{s_{W_n}(z) + z + o(1)} \tag{34}$$

outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$.
Proof. We can assume that $n\eta \ge \log^{100} n$ (say), as the claim is trivial otherwise. By symmetry, it will suffice to establish

$$R(z)_{nn} = -\frac{1}{s_{W_n}(z) + z + o(1)}$$

outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$. By (33), this statement is equivalent to

$$X^* R^{(n)}(z) X - \frac{1}{\sqrt{n}} \xi_{nn} = s_{W_n}(z) + o(1).$$

By Condition C0, one has $\frac{1}{\sqrt{n}} \xi_{nn} = o(1)$ outside of an event of probability $O(\exp(-n^c))$, which is certainly acceptable; so our task is now to show that

$$X^* R^{(n)}(z) X = s_{W_n}(z) + o(1) \tag{35}$$

outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$.

From the Cauchy interlacing law (cf. [30, §5.2]) we know that

$$\frac{1}{n} \operatorname{trace} R^{(n)}(z) = s_{W_n}(z) + o(1).$$

Also,

$$\operatorname{trace} R^{(n)}(z)^* R^{(n)}(z) = \sum_{i=1}^{n-1} \frac{1}{|\lambda_i(W_n^{(n)}) - E|^2 + \eta^2}. \tag{36}$$

By Proposition 30 and the union bound, we may assume that outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$, one has

$$N_I(W_n) \ll n|I|$$

for all intervals $I$ of width at least $\eta$ centered at $E$. By interlacing, we may also conclude

$$N_I(W_n^{(n)}) \ll n|I|$$

for such intervals. Inserting this bound into (36), we conclude that

$$\operatorname{trace} R^{(n)}(z)^* R^{(n)}(z) \ll \frac{n}{\eta}. \tag{37}$$

If we then apply Proposition 27 with $T := (n\eta)^{1/4}$ (say), using the hypothesis that $n\eta \ge \log^{100} n$ (so that $1/(n\eta)^c = o(1)$ for any $c > 0$), we conclude (35) outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$, as required. □



We can combine this proposition with a standard stability analysis of the self-consistent equation (34) to conclude a crude version of the local semicircle law:

Corollary 32 (Local semicircle law). Let $M_n$ be a Wigner matrix obeying Condition C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$. Then for any $z = E + \sqrt{-1}\eta$ with $E = O(1)$ and $0 < \eta \ll n^{100}$, and all $1 \le i \le n$, one has

$$s_{W_n}(z) = s_{sc}(z) + o(1) \tag{38}$$

and

$$R(z)_{ii} = s_{sc}(z) + o(1) \tag{39}$$

outside of an event with probability $O(n^{O(1)} \exp(-c(n\eta)^c))$.






We note that this corollary is essentially [17, Theorem 3.1]; in the statement of the result in [17] the additional constraint $\eta \ge \log^{C \log\log n} n / n$ for some constant $C$ is imposed, but this constraint is not actually used in the proof, at least if one is not concerned with obtaining the best possible bounds for the $o(1)$ error terms. For the convenience of the reader, we sketch the proof of this corollary below.

Proof. As before we may assume that $\eta \ge \log^{100} n / n$; we may also assume that $n$ is large. By Proposition 31 we may assume that (34) holds.

Let us first dispose of the case when $\eta$ is large, say $\eta \ge 100$. In this case, the imaginary part of $s_{W_n}(z) + z + o(1)$ is at least $100 - o(1)$, and hence by (34) one has $|s_{W_n}(z)| \le 1/100 + o(1)$; inserting this back into (34) (and using (12)) one obtains $|s_{W_n}(z) - s_{sc}(z)| \le 1/10$ (say). One can then deduce (38) from (34) (and (13)) by a routine application of the contraction mapping theorem.

Henceforth we assume that $\eta < 100$, so that $z = O(1)$. Then equation (34) already implies that $s_{W_n}(z) = O(1)$, since (34) cannot hold if $|s_{W_n}(z)|$ is too large. We may thus multiply out the denominator and conclude that

$$s_{W_n}(z)^2 + z s_{W_n}(z) + 1 = o(1).$$

Since the two solutions to the quadratic equation $s^2 + zs + 1 = 0$ are $s = s_{sc}(z)$ and $s = -z - s_{sc}(z)$, we conclude that

$$s_{W_n}(z) = s_{sc}(z) + o(1) \quad \text{or} \quad s_{W_n}(z) = -z - s_{sc}(z) + o(1)$$

outside of an event with probability $O(n^{O(1)} \exp(-c(n\eta)^c))$.
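The two branches of this dichotomy are easy to compute explicitly. The sketch below solves the quadratic with the usual formula and selects $s_{sc}(z)$ as the root with positive imaginary part, which is the root a Stieltjes transform must equal when $\operatorname{Im} z > 0$; the function names and the sample point are illustrative assumptions.

```python
import cmath


def s_sc(z: complex) -> complex:
    """Root of s^2 + z*s + 1 = 0 with positive imaginary part (valid for Im z > 0)."""
    root = cmath.sqrt(z * z - 4.0)
    s_plus = (-z + root) / 2.0
    s_minus = (-z - root) / 2.0
    return s_plus if s_plus.imag > 0 else s_minus


def other_root(z: complex) -> complex:
    """The second solution -z - s_sc(z) of the same quadratic."""
    return -z - s_sc(z)
```

Since the product of the two roots is $1$, the second root equals $1/s_{sc}(z)$ and so has negative imaginary part; this is why positivity of $\operatorname{Im} s_{W_n}$ can be used to exclude the second alternative when the two roots are well separated.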

We apply this fact with $z$ replaced by an arbitrary complex number $\zeta$ with $\operatorname{Re}(\zeta) = O(1)$ and $\eta \le \operatorname{Im}(\zeta) \ll 1$, and whose real and imaginary parts are multiples of $n^{-100}$ (say). By the union bound, the probability of the failure event is still $O(n^{O(1)} \exp(-c(n\eta)^c))$. We may then remove the latter hypotheses using the fact that $s_{W_n}$ and $s_{sc}$ have Lipschitz constant $O(n^{O(1)})$ in this region, and conclude that outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$, one has

$$s_{W_n}(\zeta) = s_{sc}(\zeta) + o(1) \quad \text{or} \quad s_{W_n}(\zeta) = -\zeta - s_{sc}(\zeta) + o(1) \tag{40}$$

for all $\zeta$ with $\operatorname{Re}(\zeta) = O(1)$ and $\eta \le \operatorname{Im}(\zeta) \ll 1$. On the other hand, if one has $\operatorname{Im}(\zeta) \ge c$ for some absolute constant $c > 0$, then the second possibility in (40) cannot occur for $n$ large enough, because $s_{W_n}(\zeta)$ necessarily has positive imaginary part. A continuity argument then shows that the first option in (40) holds for all $\zeta$ in the indicated region². This gives (38). Among other things, this shows that $|s_{W_n}(z) + z| \gg 1$ (thanks to (13)), and then from (13) and the second part of Proposition 31 we obtain (39). □

For our applications, we will also need bounds on the coefficient norm

$$\|R(z)\|_{(\infty,1)} := \sup_{1 \le i,j \le n} |R(z)_{ij}|$$

of the resolvent.

² When $\zeta$ approaches the edges $\pm 2$ of the spectrum, thus $\zeta = \pm 2 + o(1)$, the two options in (40) begin to overlap, but in that regime one can deduce the first option from the second (with a slightly worse $o(1)$ error) and so the claim made in the text is still valid.

Corollary 33 (Resolvent bound). Let $M_n$ be a Wigner matrix obeying Condition C0, and let $W_n := \frac{1}{\sqrt{n}} M_n$. Then for any $z = E + \sqrt{-1}\eta$ with $E = O(1)$ and $0 < \eta \ll n^{100}$, one has

$$\|R(z)\|_{(\infty,1)} = O(1) \tag{41}$$

outside of an event with probability $O(n^{O(1)} \exp(-c(n\eta)^c))$.



Proof. Again, we may assume $\eta \ge \log^{100} n / n$. By the union bound, it suffices to show for each $1 \le i, j \le n$ that

$$|R(z)_{ij}| = O(1)$$

outside of an event with probability $O(n^{O(1)} \exp(-c(n\eta)^c))$. In the diagonal case $i = j$, this follows directly from (39), so suppose that $i \ne j$. In this case, we may use the Schur complement identity

$$R(z)_{ij} = -R(z)_{ii}\, R^{(i)}(z)_{jj}\, K^{(ij)}_{ij}$$

where $R^{(i)}(z)$ is the resolvent associated to the $(n-1) \times (n-1)$ matrix $W_n^{(i)}$ formed by removing the $i^{th}$ row and column from $W_n$, and $K^{(ij)}_{ij}$ is the quantity

$$K^{(ij)}_{ij} = \frac{1}{\sqrt{n}} \zeta_{ij} - X_i^* (W_n^{(ij)} - z)^{-1} X_j,$$

$\zeta_{ij}$ is the $ij$ coefficient of $M_n$, $W_n^{(ij)}$ is the $(n-2) \times (n-2)$ matrix formed by removing the $i^{th}$ and $j^{th}$ rows and columns from $W_n$, and $X_i, X_j \in \mathbf{C}^{n-2}$ are the $i^{th}$ and $j^{th}$ columns of $W_n$, after removing the $i^{th}$ and $j^{th}$ rows. See [TH1 Lemma 4.2] for a proof of this identity. From (39) applied to both the original Wigner matrix $W_n$ and the minor $W_n^{(i)}$ (which is essentially also a Wigner matrix, up to an easily manageable multiplicative factor of $\sqrt{(n-1)/n}$) we see that $R(z)_{ii} = O(1)$ and $R^{(i)}(z)_{jj} = O(1)$ outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$, so it suffices to obtain the bound $K^{(ij)}_{ij} = O(1)$ outside of a similar event. But from Condition C0, one has $\frac{1}{\sqrt{n}} \zeta_{ij} = O(1)$ outside of an event of probability $O(\exp(-n^c))$, which is certainly acceptable, so it suffices to show that

$$X_i^* (W_n^{(ij)} - z)^{-1} X_j = O(1)$$

outside of an event of probability $O(n^{O(1)} \exp(-c(n\eta)^c))$. But by Proposition 27 (viewing the $(n-2) \times (n-2)$ matrix $(W_n^{(ij)} - z)^{-1}$ as the upper-right block of a nilpotent $2(n-2) \times 2(n-2)$ matrix, and concatenating $X_i$ and $X_j$ together), one has

$$X_i^* (W_n^{(ij)} - z)^{-1} X_j = O\left(\frac{1}{n}\, T \left(\operatorname{trace}\left(((W_n^{(ij)} - z)^{-1})^* (W_n^{(ij)} - z)^{-1}\right)\right)^{1/2}\right)$$

outside of an event of probability $O(\exp(-cT^c))$, for any $T > 0$. But by repeating the derivation of (37), one has

$$\operatorname{trace}\left(((W_n^{(ij)} - z)^{-1})^* (W_n^{(ij)} - z)^{-1}\right) = O\left(\frac{n}{\eta}\right).$$

If one then sets $T = \Theta(\sqrt{n\eta})$, one obtains the claim. □
We remark that the above argument in fact shows that we may improve the bound $R(z)_{ij} = O(1)$ to $R(z)_{ij} = O\left(\frac{1}{(n\eta)^{1/2 - \delta}}\right)$ for any fixed $\delta > 0$; compare with [17, Theorem 3.1]. However, we will not need this improvement here.